1 About the Dataset

Description: The data consist of 200 subjects from a larger study on the survival of patients following admission to an adult intensive care unit (ICU). The study used logistic regression to predict the probability of survival for these patients until their discharge from the hospital. The dependent variable is the binary variable Vital Status (STA). Nineteen possible predictor variables, both discrete and continuous, were also observed.

Number of cases: 200

Variable Names:

  1. ID: ID number of the patient
  2. STA: Vital status (0 = Lived, 1 = Died)
  3. AGE: Patient’s age in years
  4. SEX: Patient’s sex (0 = Male, 1 = Female)
  5. RACE: Patient’s race (1 = White, 2 = Black, 3 = Other)
  6. SER: Service at ICU admission (0 = Medical, 1 = Surgical)
  7. CAN: Is cancer part of the present problem? (0 = No, 1 = Yes)
  8. CRN: History of chronic renal failure (0 = No, 1 = Yes)
  9. INF: Infection probable at ICU admission (0 = No, 1 = Yes)
  10. CPR: CPR prior to ICU admission (0 = No, 1 = Yes)
  11. SYS: Systolic blood pressure at ICU admission (in mm Hg)
  12. HRA: Heart rate at ICU admission (beats/min)
  13. PRE: Previous admission to an ICU within 6 months (0 = No, 1 = Yes)
  14. TYP: Type of admission (0 = Elective, 1 = Emergency)
  15. FRA: Long bone, multiple, neck, single area, or hip fracture (0 = No, 1 = Yes)
  16. PO2: PO2 from initial blood gases (0 = >60, 1 = ≤60)
  17. PH: PH from initial blood gases (0 = ≥7.25, 1 = <7.25)
  18. PCO: PCO2 from initial blood gases (0 = ≤45, 1 = >45)
  19. BIC: Bicarbonate from initial blood gases (0 = ≥18, 1 = <18)
  20. CRE: Creatinine from initial blood gases (0 = ≤2.0, 1 = >2.0)
  21. LOC: Level of consciousness at admission (0 = no coma or stupor, 1= deep stupor, 2 = coma)

2 Reading in the Data set

> # ICU <- read.table("C:/Users/ekene/OneDrive - McMaster University/Avenue2Learn_Winter2020/EH 705/eh705termproject/ICUAdmissions.csv", header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)
> ICU <- read.table("./ICUAdmissions.csv", header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)
> str(ICU)
'data.frame':   200 obs. of  21 variables:
 $ ID           : int  8 12 14 28 32 38 40 41 42 50 ...
 $ Status       : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Age          : int  27 59 77 54 87 69 63 30 35 70 ...
 $ Sex          : int  1 0 0 0 1 0 0 1 0 1 ...
 $ Race         : int  1 1 1 1 1 1 1 1 2 1 ...
 $ Service      : int  0 0 1 0 1 0 1 0 0 1 ...
 $ Cancer       : int  0 0 0 0 0 0 0 0 0 1 ...
 $ Renal        : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Infection    : int  1 0 0 1 1 1 0 0 0 0 ...
 $ CPR          : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Systolic     : int  142 112 100 142 110 110 104 144 108 138 ...
 $ HeartRate    : int  88 80 70 103 154 132 66 110 60 103 ...
 $ Previous     : int  0 1 0 0 1 0 0 0 0 0 ...
 $ Type         : int  1 1 0 1 1 1 0 1 1 0 ...
 $ Fracture     : int  0 0 0 1 0 0 0 0 0 0 ...
 $ PO2          : int  0 0 0 0 0 1 0 0 0 0 ...
 $ PH           : int  0 0 0 0 0 0 0 0 0 0 ...
 $ PCO2         : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Bicarbonate  : int  0 0 0 0 0 1 0 0 0 0 ...
 $ Creatinine   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Consciousness: int  1 1 1 1 1 1 1 1 1 1 ...

3 Statistical Summary of the dataset variables

> summary(ICU)
       ID            Status         Age             Sex            Race      
 Min.   :  4.0   Min.   :0.0   Min.   :16.00   Min.   :0.00   Min.   :1.000  
 1st Qu.:210.2   1st Qu.:0.0   1st Qu.:46.75   1st Qu.:0.00   1st Qu.:1.000  
 Median :412.5   Median :0.0   Median :63.00   Median :0.00   Median :1.000  
 Mean   :444.8   Mean   :0.2   Mean   :57.55   Mean   :0.38   Mean   :1.175  
 3rd Qu.:671.8   3rd Qu.:0.0   3rd Qu.:72.00   3rd Qu.:1.00   3rd Qu.:1.000  
 Max.   :929.0   Max.   :1.0   Max.   :92.00   Max.   :1.00   Max.   :3.000  
    Service          Cancer        Renal         Infection         CPR       
 Min.   :0.000   Min.   :0.0   Min.   :0.000   Min.   :0.00   Min.   :0.000  
 1st Qu.:0.000   1st Qu.:0.0   1st Qu.:0.000   1st Qu.:0.00   1st Qu.:0.000  
 Median :1.000   Median :0.0   Median :0.000   Median :0.00   Median :0.000  
 Mean   :0.535   Mean   :0.1   Mean   :0.095   Mean   :0.42   Mean   :0.065  
 3rd Qu.:1.000   3rd Qu.:0.0   3rd Qu.:0.000   3rd Qu.:1.00   3rd Qu.:0.000  
 Max.   :1.000   Max.   :1.0   Max.   :1.000   Max.   :1.00   Max.   :1.000  
    Systolic       HeartRate         Previous         Type      
 Min.   : 36.0   Min.   : 39.00   Min.   :0.00   Min.   :0.000  
 1st Qu.:110.0   1st Qu.: 80.00   1st Qu.:0.00   1st Qu.:0.000  
 Median :130.0   Median : 96.00   Median :0.00   Median :1.000  
 Mean   :132.3   Mean   : 98.92   Mean   :0.15   Mean   :0.735  
 3rd Qu.:150.0   3rd Qu.:118.25   3rd Qu.:0.00   3rd Qu.:1.000  
 Max.   :256.0   Max.   :192.00   Max.   :1.00   Max.   :1.000  
    Fracture          PO2             PH             PCO2      Bicarbonate   
 Min.   :0.000   Min.   :0.00   Min.   :0.000   Min.   :0.0   Min.   :0.000  
 1st Qu.:0.000   1st Qu.:0.00   1st Qu.:0.000   1st Qu.:0.0   1st Qu.:0.000  
 Median :0.000   Median :0.00   Median :0.000   Median :0.0   Median :0.000  
 Mean   :0.075   Mean   :0.08   Mean   :0.065   Mean   :0.1   Mean   :0.075  
 3rd Qu.:0.000   3rd Qu.:0.00   3rd Qu.:0.000   3rd Qu.:0.0   3rd Qu.:0.000  
 Max.   :1.000   Max.   :1.00   Max.   :1.000   Max.   :1.0   Max.   :1.000  
   Creatinine   Consciousness  
 Min.   :0.00   Min.   :1.000  
 1st Qu.:0.00   1st Qu.:1.000  
 Median :0.00   Median :1.000  
 Mean   :0.05   Mean   :1.125  
 3rd Qu.:0.00   3rd Qu.:1.000  
 Max.   :1.00   Max.   :3.000  

3.1 Observation

From the ICU Admissions dataset, I made the following observations:

1. Most of the variables are stored as integers, but according to the data description most of them are categorical and should be recoded as factors.
2. Only four variables (ID, Age, Systolic, and HeartRate) can be left as numerical variables.
3. The dependent variable is the binary variable Vital Status (Status).
4. Nineteen possible predictor variables, both discrete and continuous, were also observed.
5. There are no missing data.
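The observations above can be checked programmatically. The following is a minimal sketch on a small made-up data frame; `toy`, its values, and the threshold of three distinct values are assumptions for illustration, not part of the original analysis. With the real data the same calls would be applied to `ICU`.

```r
# Toy stand-in for the ICU data frame (values are made up for illustration).
toy <- data.frame(
  Age    = c(27L, 59L, 77L, 54L, 87L, 69L),
  Status = c(0L, 0L, 1L, 0L, 1L, 0L),
  Race   = c(1L, 1L, 2L, 1L, 3L, 1L)
)

# Count missing values per column; all zeros means no missing data.
missing_per_column <- colSums(is.na(toy))

# Count distinct values per column; columns with only a few distinct
# integer codes are candidates for recoding as factors.
# The cut-off of 3 is an arbitrary choice for this toy example.
distinct_per_column <- sapply(toy, function(x) length(unique(x)))
factor_candidates <- names(distinct_per_column)[distinct_per_column <= 3]

print(missing_per_column)
print(factor_candidates)
```

A count of distinct values is only a heuristic; the data description remains the authority on which variables are categorical.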

4 Converting Numerical variables to Factor Variables

Labelling the factor levels helps with comparative analysis and visualization

> ICU <- within(ICU, {
+   Status <- factor(Status, labels=c('Lived','Died'))
+   Sex <- factor(Sex, labels=c('Male','Female'))
+   Race <- factor(Race, labels=c('White','Black','Other'))
+   Service <- factor(Service, labels=c('Medical','Surgical'))
+   Cancer <- factor(Cancer, labels=c('No','Yes'))
+   Renal <- factor(Renal, labels=c('No','Yes'))
+   Infection <- factor(Infection, labels=c('No','Yes'))
+   CPR <- factor(CPR, labels=c('No','Yes'))
+   Previous <- factor(Previous, labels=c('No','Yes'))
+   Type <- factor(Type, labels=c('Elective','Emergency'))
+   Fracture <- factor(Fracture, labels=c('No','Yes'))
+   PCO2 <- factor(PCO2, labels=c('No','Yes'))
+   PH <- factor(PH, labels=c('No','Yes'))
+   PO2 <- factor(PO2, labels=c('No','Yes'))
+   Bicarbonate <- factor(Bicarbonate, labels=c('No','Yes'))
+   Creatinine <- factor(Creatinine, labels=c('No','Yes'))
+   Consciousness <- factor(Consciousness, labels=c('Conscious','Deep Stupor','Coma'))
+ })
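When only `labels` is given, `factor()` attaches the labels to the sorted unique values that happen to be present, so a code that never occurs in the data would shift the labels. A possible safeguard, sketched here on a made-up vector (the codes and the `explicit` name are hypothetical), is to pass `levels` explicitly as well.

```r
# Hypothetical codes: 1 = Conscious, 2 = Deep Stupor, 3 = Coma.
# In this toy vector the code 2 never occurs.
codes <- c(1L, 1L, 3L, 1L, 3L)

# Without `levels`, factor() would use the sorted unique values (1 and 3),
# so three labels would raise an error and two labels would mislabel code 3.
# Naming the levels explicitly keeps every code tied to its own label.
explicit <- factor(codes,
                   levels = c(1, 2, 3),
                   labels = c("Conscious", "Deep Stupor", "Coma"))

print(table(explicit))
```

The empty "Deep Stupor" level is kept, which also makes the absence of that category visible in tables and plots.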

5 Viewing a few rows of the recoded dataset

> headTail(ICU) %>% datatable(rownames = TRUE, filter="top", options = list(pageLength = 10, scrollX=T))%>% formatRound(columns=c(1:17), digits=0)

6 Saving new recoded dataset

> # write.csv(ICU, file="ICUAdmissions_recoded.csv", row.names=FALSE)

7 Data Exploration - Statistical summaries and Graphing of variables

7.1 Statistical summary of the recoded dataset

> summary(ICU)
       ID          Status         Age            Sex         Race    
 Min.   :  4.0   Lived:160   Min.   :16.00   Male  :124   White:175  
 1st Qu.:210.2   Died : 40   1st Qu.:46.75   Female: 76   Black: 15  
 Median :412.5               Median :63.00                Other: 10  
 Mean   :444.8               Mean   :57.55                           
 3rd Qu.:671.8               3rd Qu.:72.00                           
 Max.   :929.0               Max.   :92.00                           
     Service    Cancer    Renal     Infection  CPR         Systolic    
 Medical : 93   No :180   No :181   No :116   No :187   Min.   : 36.0  
 Surgical:107   Yes: 20   Yes: 19   Yes: 84   Yes: 13   1st Qu.:110.0  
                                                        Median :130.0  
                                                        Mean   :132.3  
                                                        3rd Qu.:150.0  
                                                        Max.   :256.0  
   HeartRate      Previous         Type     Fracture   PO2        PH     
 Min.   : 39.00   No :170   Elective : 53   No :185   No :184   No :187  
 1st Qu.: 80.00   Yes: 30   Emergency:147   Yes: 15   Yes: 16   Yes: 13  
 Median : 96.00                                                          
 Mean   : 98.92                                                          
 3rd Qu.:118.25                                                          
 Max.   :192.00                                                          
  PCO2     Bicarbonate Creatinine     Consciousness
 No :180   No :185     No :190    Conscious  :185  
 Yes: 20   Yes: 15     Yes: 10    Deep Stupor:  5  
                                  Coma       : 10  
                                                   
                                                   
                                                   
  • Age ranges from 16-92 years old with a mean of 57.55 and median of 63
  • Systolic blood pressure ranges from 36-256 with a mean of 132.3 and median of 130
  • Heart rate ranges from 39-192 beats per minute with a mean of 98.92 and median of 96
> str(ICU)
'data.frame':   200 obs. of  21 variables:
 $ ID           : int  8 12 14 28 32 38 40 41 42 50 ...
 $ Status       : Factor w/ 2 levels "Lived","Died": 1 1 1 1 1 1 1 1 1 1 ...
 $ Age          : int  27 59 77 54 87 69 63 30 35 70 ...
 $ Sex          : Factor w/ 2 levels "Male","Female": 2 1 1 1 2 1 1 2 1 2 ...
 $ Race         : Factor w/ 3 levels "White","Black",..: 1 1 1 1 1 1 1 1 2 1 ...
 $ Service      : Factor w/ 2 levels "Medical","Surgical": 1 1 2 1 2 1 2 1 1 2 ...
 $ Cancer       : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 2 ...
 $ Renal        : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ Infection    : Factor w/ 2 levels "No","Yes": 2 1 1 2 2 2 1 1 1 1 ...
 $ CPR          : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ Systolic     : int  142 112 100 142 110 110 104 144 108 138 ...
 $ HeartRate    : int  88 80 70 103 154 132 66 110 60 103 ...
 $ Previous     : Factor w/ 2 levels "No","Yes": 1 2 1 1 2 1 1 1 1 1 ...
 $ Type         : Factor w/ 2 levels "Elective","Emergency": 2 2 1 2 2 2 1 2 2 1 ...
 $ Fracture     : Factor w/ 2 levels "No","Yes": 1 1 1 2 1 1 1 1 1 1 ...
 $ PO2          : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 2 1 1 1 1 ...
 $ PH           : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ PCO2         : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ Bicarbonate  : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 2 1 1 1 1 ...
 $ Creatinine   : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ Consciousness: Factor w/ 3 levels "Conscious","Deep Stupor",..: 1 1 1 1 1 1 1 1 1 1 ...

7.2 Numerical Summary of Integer Variables

> numSummary(ICU[,c("Age", "HeartRate", "Systolic"), drop=FALSE], statistics=c("mean", "sd", "IQR", 
+   "quantiles"), quantiles=c(0,.25,.5,.75,1))
             mean       sd   IQR 0%    25% 50%    75% 100%   n
Age        57.545 20.05465 25.25 16  46.75  63  72.00   92 200
HeartRate  98.925 26.82962 38.25 39  80.00  96 118.25  192 200
Systolic  132.280 32.95210 40.00 36 110.00 130 150.00  256 200

8 Graphing

8.1 Plotting the frequency distribution of all factor variables

> p01<-ggplot(ICU, aes(x=Sex )) +
+  geom_bar( fill="pink" )  +
+  theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+  theme(axis.text.x=element_text(size=14 )) +
+  annotate("text", x=.8, y=-5, label="Base=200",  size=4, color="black" )  
> 
> p02<-ggplot(ICU, aes(x=Race )) +
+  geom_bar( fill="lightblue" )  +
+  theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+  theme(axis.text.x=element_text(size=14, angle = 45, hjust = 1 )) +
+  annotate("text", x=.8, y=-1, label="Base=200",  size=4) 
>  
> p03<-ggplot(ICU, aes(x=Service )) +
+  geom_bar( fill="blue" )  +
+  theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+  theme(axis.text.x=element_text(size=14 )) +
+  annotate("text", x=.8, y=-1, label="Base=200",  size=4)
> 
> p04 <- ggplot(ICU, aes(x=Cancer )) +
+  geom_bar( fill="Green" )  +
+  theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+  theme(axis.text.x=element_text(size=14, angle = 45, hjust = 1 )) +
+  annotate("text", x=.8, y=-1, label="Base=200",  size=4)
> 
> p05 <- ggplot(ICU, aes(x=Renal )) +
+  geom_bar( fill="Orange" )  +
+  theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+  theme(axis.text.x=element_text(size=14 )) +
+  annotate("text", x=.8, y=-1, label="Base=200",  size=4)
> 
> p06 <- ggplot(ICU, aes(x=Infection )) +
+  geom_bar( fill="Red" )  +
+  theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+  theme(axis.text.x=element_text(size=14 )) +
+  annotate("text", x=.8, y=-1, label="Base=200",  size=4)
> 
> p07 <- ggplot(ICU, aes(x=CPR )) +
+  geom_bar( fill="Yellow" )  +
+  theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+  theme(axis.text.x=element_text(size=14, angle = 45, hjust = 1 )) +
+  annotate("text", x=.8, y=-1, label="Base=200",  size=4)
>  
> p08 <- ggplot(ICU, aes(x=Previous )) +
+  geom_bar( fill="Purple" )  +
+  theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+  theme(axis.text.x=element_text(size=14 )) +
+  annotate("text", x=.8, y=-1, label="Base=200",  size=4)
> 
> library(Rmisc)
> multiplot(p01, p02, p03, p04, p05, p06, p07, p08,  layout=matrix(c(1:8), nrow=4, byrow=TRUE))

> p09<-ggplot(ICU, aes(x=Type )) +
+  geom_bar( fill="pink" )  +
+  theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+  theme(axis.text.x=element_text(size=14 )) +
+  annotate("text", x=.8, y=-5, label="Base=200",  size=4, color="black" )
> 
> p10<-ggplot(ICU, aes(x=Fracture )) +
+  geom_bar( fill="lightblue" )  +
+  theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+  theme(axis.text.x=element_text(size=14, angle = 45, hjust = 1 )) +
+  annotate("text", x=.8, y=-1, label="Base=200",  size=4)
> 
> p11<-ggplot(ICU, aes(x=PO2 )) +
+  geom_bar( fill="blue" )  +
+  theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+  theme(axis.text.x=element_text(size=14 )) +
+  annotate("text", x=.8, y=-1, label="Base=200",  size=4)
> 
> p12 <- ggplot(ICU, aes(x=PH )) +
+  geom_bar( fill="Green" )  +
+  theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+  theme(axis.text.x=element_text(size=14, angle = 45, hjust = 1 )) +
+  annotate("text", x=.8, y=-1, label="Base=200",  size=4)
> 
> p13 <- ggplot(ICU, aes(x=PCO2 )) +
+  geom_bar( fill="Orange" )  +
+  theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+  theme(axis.text.x=element_text(size=14 )) +
+  annotate("text", x=.8, y=-1, label="Base=200",  size=4)
> 
> p14 <- ggplot(ICU, aes(x=Bicarbonate )) +
+  geom_bar( fill="Red" )  +
+  theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+  theme(axis.text.x=element_text(size=14 )) +
+  annotate("text", x=.8, y=-1, label="Base=200",  size=4)
> 
> p15 <- ggplot(ICU, aes(x=Creatinine )) +
+  geom_bar( fill="Yellow" )  +
+  theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+  theme(axis.text.x=element_text(size=14, angle = 45, hjust = 1 )) +
+  annotate("text", x=.8, y=-1, label="Base=200",  size=4)
>  
> p16 <- ggplot(ICU, aes(x=Consciousness )) +
+  geom_bar( fill="Purple" )  +
+  theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+  theme(axis.text.x=element_text(size=14 )) +
+  annotate("text", x=.8, y=-1, label="Base=200",  size=4)
> 
> library(Rmisc)
> multiplot(p09, p10, p11, p12, p13, p14, p15, p16, layout=matrix(c(1:8), nrow=4, byrow=TRUE))

8.2 Distribution of Vital Status by the factor variables

> library(ggplot2)
> f01<-ggplot(ICU, aes(x=Sex, fill = Status)) +
+  theme_bw() +
+  geom_bar() +
+  labs(y = "Patient Count",
+       title = "Vital Status by Sex")
> 
> f02<-ggplot(ICU, aes(x=Race, fill = Status)) +
+  theme_bw() +
+  geom_bar() +
+  labs(y = "Patient Count",
+       title = "Vital Status by Race")
> f03<-ggplot(ICU, aes(x=Service, fill = Status)) +
+  theme_bw() +
+  geom_bar() +
+  labs(y = "Patient Count",
+       title = "Vital Status by Service")
> 
> f04<-ggplot(ICU, aes(x=Cancer, fill = Status)) +
+  theme_bw() +
+  geom_bar() +
+  labs(y = "Patient Count",
+       title = "Vital Status by Cancer")
> 
> library(Rmisc)
> multiplot(f01, f02, f03, f04,  layout=matrix(c(1:4), nrow=2, byrow=TRUE))

> f05<-ggplot(ICU, aes(x=Renal, fill = Status)) +
+  theme_bw() +
+  geom_bar() +
+  labs(y = "Patient Count",
+       title = "Vital Status by Renal")
> 
> f06<-ggplot(ICU, aes(x=Infection, fill = Status)) +
+  theme_bw() +
+  geom_bar() +
+  labs(y = "Patient Count",
+       title = "Vital Status by Infection")
> 
> f07<-ggplot(ICU, aes(x=CPR, fill = Status)) +
+  theme_bw() +
+  geom_bar() +
+  labs(y = "Patient Count",
+       title = "Vital Status by CPR")
> 
> f08<-ggplot(ICU, aes(x=Previous, fill = Status)) +
+  theme_bw() +
+  geom_bar() +
+  labs(y = "Patient Count",
+       title = "Vital Status by Previous")
> 
> library(Rmisc)
> multiplot(f05, f06, f07, f08,  layout=matrix(c(1:4), nrow=2, byrow=TRUE))

> f09<-ggplot(ICU, aes(x=Type, fill = Status)) +
+  theme_bw() +
+  geom_bar() +
+  labs(y = "Patient Count",
+       title = "Vital Status by Type")
> 
> f10<-ggplot(ICU, aes(x=Fracture, fill = Status)) +
+  theme_bw() +
+  geom_bar() +
+  labs(y = "Patient Count",
+       title = "Vital Status by Fracture")
> 
> f11<-ggplot(ICU, aes(x=PO2, fill = Status)) +
+  theme_bw() +
+  geom_bar() +
+  labs(y = "Patient Count",
+       title = "Vital Status by PO2")
> 
> f12<-ggplot(ICU, aes(x=PH, fill = Status)) +
+  theme_bw() +
+  geom_bar() +
+  labs(y = "Patient Count",
+       title = "Vital Status by PH")
> 
> library(Rmisc)
> multiplot(f09, f10, f11, f12,  layout=matrix(c(1:4), nrow=2, byrow=TRUE))

> f13<-ggplot(ICU, aes(x=PCO2, fill = Status)) +
+  theme_bw() +
+  geom_bar() +
+  labs(y = "Patient Count",
+       title = "Vital Status by PCO2")
> 
> f14<-ggplot(ICU, aes(x=Bicarbonate, fill = Status)) +
+  theme_bw() +
+  geom_bar() +
+  labs(y = "Patient Count",
+       title = "Vital Status by Bicarbonate")
> 
> f15<-ggplot(ICU, aes(x=Creatinine, fill = Status)) +
+  theme_bw() +
+  geom_bar() +
+  labs(y = "Patient Count",
+       title = "Vital Status by Creatinine")
> 
> f16<-ggplot(ICU, aes(x=Consciousness, fill = Status)) +
+  theme_bw() +
+  geom_bar() +
+  labs(y = "Patient Count",
+       title = "Vital Status by Consciousness")
> 
> 
> library(Rmisc)
> multiplot(f13, f14, f15, f16, layout=matrix(c(1:4), nrow=2, byrow=TRUE))

8.3 Plotting the density distribution of the numeric variables

> d01 <- ggplot(ICU, aes(x=Age)) +
+  geom_density(fill="green") +
+  ggtitle("Age") +
+  theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+  theme(axis.text.x=element_text(size=14 )) +
+  annotate("text", x=0.8, y=-0.001, label="Base=200",  size=4)
> 
> d02 <- ggplot(ICU, aes(x=Systolic)) +
+  geom_density(fill="green") +
+  ggtitle("Systolic") +
+  theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+  theme(axis.text.x=element_text(size=14 )) +
+  annotate("text", x=0.8, y=-0.001, label="Base=200",  size=4)
> 
> d03 <- ggplot(ICU, aes(x=HeartRate)) +
+  geom_density(fill="green") +
+  ggtitle("HeartRate") +
+  theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+  theme(axis.text.x=element_text(size=14 )) +
+  annotate("text", x=0.8, y=-0.001, label="Base=200",  size=4)
> 
> d04 <- ggplot(ICU, aes(x=ID)) +
+  geom_density(fill="green") +
+  ggtitle("ID") +
+  theme(axis.title.x=element_text(size=16, face="bold", colour="blue")) +
+  theme(axis.text.x=element_text(size=14 )) +
+  annotate("text", x=0.8, y=-0.001, label="Base=200",  size=4)
> 
> multiplot(d01, d02, d03, d04, layout=matrix(c(1:4), nrow=2, byrow=TRUE))

8.4 Density plot of Vital Status by Numeric variables

> n01<-ggplot(ICU, aes(x=Age, fill = Status)) +
+   theme_bw() +
+       geom_density(alpha=0.5) +
+   labs(y = "Density",
+        title = "Density distribution of Vital Status by Age")
> 
> n02<-ggplot(ICU, aes(x=Systolic, fill = Status)) +
+   theme_bw() +
+       geom_density(alpha=0.5) +
+   labs(y = "Density",
+        title = "Density distribution of Vital Status by Systolic")
> 
> n03<-ggplot(ICU, aes(x=HeartRate, fill = Status)) +
+   theme_bw() +
+       geom_density(alpha=0.5) +
+   labs(y = "Density",
+        title = "Density distribution of Vital Status by HeartRate")
> 
> n04<-ggplot(ICU, aes(x=Age, fill = Status)) +
+   theme_bw() +
+   facet_wrap(~ Sex) +
+       geom_density(alpha=0.5) +
+   labs(y = "Density",
+        title = "Density distribution of Vital Status in male and female patients by Age")
> 
> multiplot(n01, n02, n03, n04, layout=matrix(c(1:4), nrow=2, byrow=TRUE))

9 Observations

  • There are more male than female patients in the sample (124 vs. 76).
  • More patients lived (160) than died (40).
  • In both male and female patients, most deaths occurred among those aged 50 and above.
  • Female patients appear to have experienced more deaths than male patients in the same age bracket.
  • The density plot peaks at about age 75 for both male and female patients, suggesting that deaths were most concentrated around that age.

10 Statistical Analysis

================================================================================

   Variable   p_1 p_10   p_25  p_50   p_75  p_90   p_99
1  Systolic 55.92   92    110   130    150   170 212.12
2       Age 16.99   21  46.75    63     72    78     91
3 HeartRate 45.98   65     80    96 118.25 136.1 162.08
4        ID 11.96 81.3 210.25 412.5 671.75 829.8 924.01
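The code that produced the percentile table above is not shown. As a rough sketch only, a table of this shape could be built with base R's `quantile()`; the `toy` data frame, its values, and the use of R's default quantile type are assumptions.

```r
# Toy numeric columns (made-up values, not the ICU data).
toy <- data.frame(
  Age      = c(16, 21, 47, 63, 72, 78, 92),
  Systolic = c(36, 92, 110, 130, 150, 170, 256)
)

# Percentiles to report, expressed as whole numbers for the column names.
pct <- c(1, 10, 25, 50, 75, 90, 99)

# quantile() returns one vector per column; t() puts the variables in rows.
pct_table <- t(sapply(toy, quantile, probs = pct / 100))
colnames(pct_table) <- paste0("p_", pct)

print(round(pct_table, 2))
```

Different quantile definitions (the `type` argument) can give slightly different values in the tails, so a table produced this way may not match another tool digit for digit.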

10.1 Distribution

> with(ICU, qqPlot(Systolic, dist="norm", id=list(method="y", n=2, 
+   labels=rownames(ICU)), main="Systolic"))

[1] 200 179
> normalityTest(~Systolic, test="shapiro.test", data=ICU)

    Shapiro-Wilk normality test

data:  Systolic
W = 0.98369, p-value = 0.0204
> with(ICU, qqPlot(HeartRate, dist="norm", id=list(method="y", n=2, 
+   labels=rownames(ICU)), main="Heartrate"))

[1] 125  48
> normalityTest(~HeartRate, test="shapiro.test", data=ICU)

    Shapiro-Wilk normality test

data:  HeartRate
W = 0.98598, p-value = 0.04478
> with(ICU, qqPlot(Age, dist="norm", id=list(method="y", n=2, 
+   labels=rownames(ICU)), main="Age"))

[1] 23 97
> normalityTest(~Age, test="shapiro.test", data=ICU)

    Shapiro-Wilk normality test

data:  Age
W = 0.92836, p-value = 2.507e-08
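All three p-values above are below 0.05, so normality is rejected at the 5% level for Systolic, HeartRate, and Age. The output header is the one printed by base R's `shapiro.test()`, so the same test can be sketched without any add-on package. The simulated sample below is an assumption for illustration, not the ICU data.

```r
set.seed(1)                      # make the simulated sample reproducible
x <- rexp(200, rate = 1)         # strongly right-skewed toy sample

res <- shapiro.test(x)           # base R Shapiro-Wilk test
print(res)

# A small p-value is evidence against normality at the chosen level.
reject_normality <- res$p.value < 0.05
print(reject_normality)
```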

11 Differences in Means

11.1 Summary statistics

> numSummary(ICU[,c("Age", "Systolic", "HeartRate"), drop=FALSE], groups=ICU$Status, statistics=c("mean", "sd", "se(mean)"),quantiles=c(0,.25, .5, .75,1))

Variable: Age 
        mean       sd se(mean)   n
Lived 55.650 20.42818 1.614990 160
Died  65.125 16.64900 2.632438  40

Variable: Systolic 
          mean       sd se(mean)   n
Lived 135.6438 29.80151 2.356016 160
Died  118.8250 41.08084 6.495451  40

Variable: HeartRate 
         mean       sd se(mean)   n
Lived  98.500 26.97868 2.132852 160
Died  100.625 26.49304 4.188918  40
> library(psych)
> describeBy(ICU, ICU$Status)

 Descriptive statistics by group 
group: Lived
               vars   n   mean     sd median trimmed    mad min max range  skew
ID                1 160 457.04 276.35  438.5  454.25 346.19   8 929   921  0.07
Status*           2 160   1.00   0.00    1.0    1.00   0.00   1   1     0   NaN
Age               3 160  55.65  20.43   61.0   56.86  19.27  16  91    75 -0.55
Sex*              4 160   1.38   0.49    1.0    1.34   0.00   1   2     1  0.51
Race*             5 160   1.19   0.50    1.0    1.05   0.00   1   3     2  2.65
Service*          6 160   1.58   0.49    2.0    1.60   0.00   1   2     1 -0.33
Cancer*           7 160   1.10   0.30    1.0    1.00   0.00   1   2     1  2.64
Renal*            8 160   1.07   0.25    1.0    1.00   0.00   1   2     1  3.38
Infection*        9 160   1.38   0.49    1.0    1.34   0.00   1   2     1  0.51
CPR*             10 160   1.04   0.19    1.0    1.00   0.00   1   2     1  4.82
Systolic         11 160 135.64  29.80  132.0  133.97  29.65  48 224   176  0.42
HeartRate        12 160  98.50  26.98   95.0   97.48  25.20  39 192   153  0.44
Previous*        13 160   1.14   0.35    1.0    1.05   0.00   1   2     1  2.01
Type*            14 160   1.68   0.47    2.0    1.73   0.00   1   2     1 -0.77
Fracture*        15 160   1.07   0.26    1.0    1.00   0.00   1   2     1  3.20
PO2*             16 160   1.07   0.25    1.0    1.00   0.00   1   2     1  3.38
PH*              17 160   1.06   0.23    1.0    1.00   0.00   1   2     1  3.82
PCO2*            18 160   1.10   0.30    1.0    1.00   0.00   1   2     1  2.64
Bicarbonate*     19 160   1.06   0.24    1.0    1.00   0.00   1   2     1  3.58
Creatinine*      20 160   1.03   0.17    1.0    1.00   0.00   1   2     1  5.34
Consciousness*   21 160   1.02   0.22    1.0    1.00   0.00   1   3     2  8.69
               kurtosis    se
ID                -1.25 21.85
Status*             NaN  0.00
Age               -0.82  1.61
Sex*              -1.75  0.04
Race*              5.98  0.04
Service*          -1.91  0.04
Cancer*            5.01  0.02
Renal*             9.46  0.02
Infection*        -1.75  0.04
CPR*              21.40  0.02
Systolic           0.34  2.36
HeartRate          0.17  2.13
Previous*          2.06  0.03
Type*             -1.41  0.04
Fracture*          8.27  0.02
PO2*               9.46  0.02
PH*               12.64  0.02
PCO2*              5.01  0.02
Bicarbonate*      10.89  0.02
Creatinine*       26.66  0.01
Consciousness*    74.04  0.02
------------------------------------------------------------ 
group: Died
               vars  n   mean     sd median trimmed    mad min max range  skew
ID                1 40 395.95 250.74    363  386.72 243.89   4 921   917  0.37
Status*           2 40   2.00   0.00      2    2.00   0.00   2   2     0   NaN
Age               3 40  65.12  16.65     68   66.47  11.86  19  92    73 -0.84
Sex*              4 40   1.40   0.50      1    1.38   0.00   1   2     1  0.39
Race*             5 40   1.12   0.46      1    1.00   0.00   1   3     2  3.46
Service*          6 40   1.35   0.48      1    1.31   0.00   1   2     1  0.61
Cancer*           7 40   1.10   0.30      1    1.00   0.00   1   2     1  2.57
Renal*            8 40   1.20   0.41      1    1.12   0.00   1   2     1  1.44
Infection*        9 40   1.60   0.50      2    1.62   0.00   1   2     1 -0.39
CPR*             10 40   1.18   0.38      1    1.09   0.00   1   2     1  1.65
Systolic         11 40 118.83  41.08    126  117.22  32.62  36 256   220  0.60
HeartRate        12 40 100.62  26.49     96   99.78  25.20  55 160   105  0.28
Previous*        13 40   1.18   0.38      1    1.09   0.00   1   2     1  1.65
Type*            14 40   1.95   0.22      2    2.00   0.00   1   2     1 -3.98
Fracture*        15 40   1.07   0.27      1    1.00   0.00   1   2     1  3.11
PO2*             16 40   1.12   0.33      1    1.03   0.00   1   2     1  2.18
PH*              17 40   1.10   0.30      1    1.00   0.00   1   2     1  2.57
PCO2*            18 40   1.10   0.30      1    1.00   0.00   1   2     1  2.57
Bicarbonate*     19 40   1.12   0.33      1    1.03   0.00   1   2     1  2.18
Creatinine*      20 40   1.12   0.33      1    1.03   0.00   1   2     1  2.18
Consciousness*   21 40   1.52   0.82      1    1.41   0.00   1   3     2  1.03
               kurtosis    se
ID                -1.00 39.65
Status*             NaN  0.00
Age                0.73  2.63
Sex*              -1.89  0.08
Race*             10.72  0.07
Service*          -1.67  0.08
Cancer*            4.71  0.05
Renal*             0.09  0.06
Infection*        -1.89  0.08
CPR*               0.73  0.06
Systolic           1.40  6.50
HeartRate         -0.75  4.19
Previous*          0.73  0.06
Type*             14.16  0.03
Fracture*          7.85  0.04
PO2*               2.84  0.05
PH*                4.71  0.05
PCO2*              4.71  0.05
Bicarbonate*       2.84  0.05
Creatinine*        2.84  0.05
Consciousness*    -0.74  0.13

11.2 Age by Status

Ho: the variances for Lived and Died are equal
Ha: the variances are different

Ho: the means are equal
Ha: the means are different

The following is the comparison of variances between the two Status groups for Age.

> with(ICU, tapply(Age, Status, var, na.rm=TRUE))
   Lived     Died 
417.3107 277.1891 

The following code runs Levene's test for homogeneity of variances.

> leveneTest(Age ~ Status, data=ICU, center="median")
Levene's Test for Homogeneity of Variance (center = "median")
       Df F value  Pr(>F)  
group   1   3.127 0.07855 .
      198                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since the p-value = 0.07855 from Levene's test is greater than 0.05, we fail to reject the null hypothesis at the 5% significance level and proceed on the assumption that the variances for Lived and Died are equal. As such, the Student t-test is used to test whether the means differ significantly.
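With `center="median"`, Levene's test amounts to a one-way ANOVA on the absolute deviations of each observation from its group median. The following base R sketch illustrates that idea on made-up values; `y`, `g`, and the other names are hypothetical.

```r
# Toy data: two groups with visibly different spread (made-up values).
y <- c(10, 12, 11, 13, 12, 11,   2, 20, 8, 25, 1, 18)
g <- factor(rep(c("A", "B"), each = 6))

# Absolute deviation of each observation from its own group median.
group_median <- tapply(y, g, median)
abs_dev <- abs(y - group_median[as.character(g)])

# One-way ANOVA on the absolute deviations gives the Levene F statistic.
levene_table <- anova(lm(abs_dev ~ g))
print(levene_table)
```

A large F value here indicates that the typical distance from the group median differs between groups, that is, that the spreads differ.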

> t.test(Age~Status, alternative='two.sided', conf.level=.95, var.equal=TRUE, data=ICU)

    Two Sample t-test

data:  Age by Status
t = -2.7151, df = 198, p-value = 0.007211
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -16.35688  -2.59312
sample estimates:
mean in group Lived  mean in group Died 
             55.650              65.125 
> qt(c(0.025), df=198, lower.tail=TRUE)
[1] -1.972017

As the p-value = 0.007211 is less than 0.05, 0 is not within the 95% confidence interval of -16.35688 to -2.59312, and t = -2.7151 is less than the critical value of -1.972017, we reject the null hypothesis at the 5% significance level and conclude that the mean Age differs between the Lived and Died groups.
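As a cross-check, the t statistic can be reproduced by hand from the group means, variances, and sample sizes reported above. This is a sketch of the standard pooled-variance calculation; the variable names are illustrative.

```r
# Group summaries copied from the output above (Age by Status).
n1 <- 160; m1 <- 55.650; v1 <- 417.3107   # Lived
n2 <- 40;  m2 <- 65.125; v2 <- 277.1891   # Died

# Pooled variance weights each group variance by its degrees of freedom.
df <- n1 + n2 - 2
pooled_var <- ((n1 - 1) * v1 + (n2 - 1) * v2) / df

# Standard error of the difference in means, then the t statistic.
se_diff <- sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat <- (m1 - m2) / se_diff

# Two-sided p-value and the 5% critical value for these degrees of freedom.
p_value <- 2 * pt(-abs(t_stat), df)
t_crit <- qt(0.975, df)

print(c(t = t_stat, df = df, p = p_value, critical = t_crit))
```

The degrees of freedom for the pooled test are n1 + n2 - 2 = 198, which is the value the critical-value lookup should use.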

11.3 HeartRate by Status

Ho: the variances for Lived and Died are equal
Ha: the variances are different

Ho: the means are equal
Ha: the means are different

The following is the comparison of variances between the two Status groups for HeartRate.

> with(ICU, tapply(HeartRate, Status, var, na.rm=TRUE))
   Lived     Died 
727.8491 701.8814 

The following code runs Levene's test for homogeneity of variances.

> leveneTest(HeartRate ~ Status, data=ICU, center="median")
Levene's Test for Homogeneity of Variance (center = "median")
       Df F value Pr(>F)
group   1   0.008  0.929
      198               

Since the p-value = 0.929 from Levene's test is greater than 0.05, we fail to reject the null hypothesis at the 5% significance level and proceed on the assumption that the variances for Lived and Died are equal. As such, the Student t-test is used to test whether the means differ significantly.

> t.test(HeartRate~Status, alternative='two.sided', conf.level=.95, var.equal=TRUE, data=ICU)

    Two Sample t-test

data:  HeartRate by Status
t = -0.44714, df = 198, p-value = 0.6553
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -11.496845   7.246845
sample estimates:
mean in group Lived  mean in group Died 
             98.500             100.625 
> qt(c(0.025), df=198, lower.tail=TRUE)
[1] -1.972017

As the p-value = 0.6553 is greater than 0.05, 0 is within the 95% confidence interval of -11.496845 to 7.246845, and t = -0.44714 is greater than the critical value of -1.972017, we fail to reject the null hypothesis at the 5% significance level: there is no evidence that mean HeartRate differs between those who lived and those who died.

11.4 Systolic by Status

Ho: the variances for Lived and Died are equal
Ha: the variances are different

Ho: the means are equal
Ha: the means are different

The following is the comparison of variances between the two Status groups for Systolic.

> with(ICU, tapply(Systolic, Status, var, na.rm=TRUE))
    Lived      Died 
 888.1301 1687.6353 

The following code runs Levene's test for homogeneity of variances.

> leveneTest(Systolic ~ Status, data=ICU, center="median")
Levene's Test for Homogeneity of Variance (center = "median")
       Df F value  Pr(>F)  
group   1  4.1872 0.04205 *
      198                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since the p-value = 0.04205 from Levene's test is less than 0.05, we reject the null hypothesis at the 5% significance level and conclude that the variances for Lived and Died are not equal. As such, the Welch two-sample t-test is used to test whether the means differ significantly.

> t.test(Systolic~Status, alternative="two.sided", conf.level=.95, var.equal=FALSE, data=ICU)

    Welch Two Sample t-test

data:  Systolic by Status
t = 2.4341, df = 49.726, p-value = 0.01856
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  2.938642 30.698858
sample estimates:
mean in group Lived  mean in group Died 
           135.6438            118.8250 
> qt(c(0.025), df=49.726, lower.tail=TRUE)
[1] -2.008833

As the p-value = 0.01856 is less than 0.05, 0 is not within the 95% confidence interval of 2.938642 to 30.698858, and t = 2.4341 is greater than the critical value of 2.008833, we reject the null hypothesis at the 5% significance level and conclude that mean systolic blood pressure differs between those who lived and those who died.
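The fractional degrees of freedom reported by the Welch test come from the Welch-Satterthwaite approximation. The sketch below recomputes the t statistic and the degrees of freedom from the group summaries reported above; the variable names are illustrative.

```r
# Group summaries copied from the output above (Systolic by Status).
n1 <- 160; m1 <- 135.6438; v1 <- 888.1301    # Lived
n2 <- 40;  m2 <- 118.8250; v2 <- 1687.6353   # Died

# Variance of each group mean.
s1 <- v1 / n1
s2 <- v2 / n2

# Welch t statistic: the two variances are not pooled.
t_stat <- (m1 - m2) / sqrt(s1 + s2)

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch <- (s1 + s2)^2 / (s1^2 / (n1 - 1) + s2^2 / (n2 - 1))

# Two-sided p-value on the approximate degrees of freedom.
p_value <- 2 * pt(-abs(t_stat), df_welch)

print(c(t = t_stat, df = df_welch, p = p_value))
```

Because the smaller group (Died, n = 40) has the larger variance, the approximate degrees of freedom fall far below the 198 used by the pooled test.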

11.5 Age by Sex

Ho: the variances for Male and Female are equal
Ha: the variances are different

Ho: the means are equal
Ha: the means are different

The following is the comparison of variances between the two Sex groups for Age.

> with(ICU, tapply(Age, Sex, var, na.rm=TRUE))
    Male   Female 
378.4780 436.5867 

The following code runs Levene's test for homogeneity of variances.

> leveneTest(Age ~ Sex, data=ICU, center="median")
Levene's Test for Homogeneity of Variance (center = "median")
       Df F value Pr(>F)
group   1  0.1154 0.7344
      198               

Since the p-value = 0.7344 from Levene's test is greater than 0.05, we fail to reject the null hypothesis at the 5% significance level and proceed on the assumption that the variances for Male and Female are equal. As such, the Student t-test is used to test whether the means differ significantly.

> t.test(Age~Sex, alternative='two.sided', conf.level=.95, var.equal=TRUE, data=ICU)

    Two Sample t-test

data:  Age by Sex
t = -1.3582, df = 198, p-value = 0.1759
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -9.708824  1.789469
sample estimates:
  mean in group Male mean in group Female 
            56.04032             60.00000 
> qt(c(0.025), df=198, lower.tail=TRUE)
[1] -1.972017